Chapter 6

Taking All Kinds of Samples

IN THIS CHAPTER

Grasping the concept of statistical error

Setting up your sampling frame

Executing a sampling strategy

Sampling — or taking a sample — is an important concept in statistics. As described in Chapter 3, the

purpose of taking a sample (a group of individuals drawn from a population) and measuring just the

sample is to avoid having to conduct a census and measure the whole population. Instead, you

can measure just the sample and use statistical approaches to make inferences about the whole

population, a practice called inferential statistics. You can estimate a measurement of the entire

population, called a parameter, by calculating a statistic from your sample.

Some samples do a better job than others at representing the population from which they are drawn.

We begin this chapter by digging more deeply into some important concepts related to sampling. We

then describe specific sampling approaches and discuss their pros and cons.

Making Forgivable (and Non-Forgivable) Errors

A central concept in statistics is that of error. In statistics, the term error sometimes means what you

think it means — that a mistake has been made. In those cases, the statistician should take steps to

avoid the error. But other times in statistics, the term error refers to a phenomenon that is unavoidable,

and as statisticians, we just have to cope with it.

For example, imagine that you had a list of all the patients of a particular clinic and their current ages.

Suppose that you calculated the average age of the patients on your list, and your answer was 43.7

years. That would be a population parameter. Now, let’s say you took a random sample of 20 patients

from that list and calculated the mean age of the sample, which would be a sample statistic. Do you

think you would get exactly 43.7 years? Although it is certainly possible, in all likelihood, the mean of

your sample — the statistic — would be a different number than the mean of your population — the

parameter. The difference between a sample statistic and the population parameter that arises simply

because you measured a sample instead of the whole population is called sampling error. Sampling

error is unavoidable, and as statisticians, we are forced to accept it.
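
To make the clinic example concrete, here is a minimal Python sketch of sampling error. It assumes a made-up list of 500 patient ages (the book's actual list is not available, so the mean will not be 43.7), and the names population_ages, sample_ages, and repeated_means are purely illustrative. It computes the parameter, draws a random sample of 20 to get a statistic, and then repeats the sampling many times to show how rarely the two match exactly.

```python
import random
import statistics

random.seed(6)  # fixed seed so the illustration is repeatable

# Hypothetical clinic list: 500 made-up patient ages (an assumption, not data from the book).
population_ages = [random.randint(18, 90) for _ in range(500)]

# Parameter: the mean age of every patient on the list.
population_mean = statistics.mean(population_ages)

# Statistic: the mean age of one simple random sample of 20 patients.
sample_ages = random.sample(population_ages, 20)
sample_mean = statistics.mean(sample_ages)

# Sampling error for this one sample: statistic minus parameter.
sampling_error = sample_mean - population_mean

# Draw 1,000 more samples of 20 to see how the statistic varies from sample to sample.
repeated_means = [
    statistics.mean(random.sample(population_ages, 20)) for _ in range(1000)
]
exact_matches = sum(1 for m in repeated_means if m == population_mean)

print(f"Population mean (parameter): {population_mean:.2f}")
print(f"Sample mean (statistic):     {sample_mean:.2f}")
print(f"Sampling error:              {sampling_error:+.2f}")
print(f"Smallest and largest of 1,000 sample means: "
      f"{min(repeated_means):.2f}, {max(repeated_means):.2f}")
print(f"Samples whose mean equals the parameter exactly: {exact_matches}")
```

Nothing went wrong in any of these samples. Each one was drawn fairly from the full list, yet the sample means scatter around the population mean, which is why this kind of error is something to accept and quantify, not something to fix.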

Now, to describe the other type of error, let’s add some drama. Suppose that when you went to take a

sample of those 20 patients, you spilled coffee on the list so you could not read some of the names. The

names blotted out by the coffee were therefore ineligible to be selected for your sample. This is unfair

to the names under the coffee stain — they have a zero probability of being selected for your sample,

even though they are part of the population from which you are sampling. This is called

undercoverage, and it is considered a type of non-sampling error. Non-sampling error is essentially a

mistake: something that goes wrong during sampling and that you should try to avoid. And unlike